AITopics | multi-object representation learning

Collaborating Authors

multi-object representation learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-Object Representation Learning via Feature Connectivity and Object-Centric Regularization

Neural Information Processing SystemsDec-26-2025, 16:05:30 GMT

Discovering object-centric representations from images has the potential to greatly improve the robustness, sample efficiency and interpretability of machine learning algorithms. Current works on multi-object images typically follow a generative approach that optimizes for input reconstruction and fail to scale to real-world datasets despite significant increases in model capacity. We address this limitation by proposing a novel method that leverages feature connectivity to cluster neighboring pixels likely to belong to the same object. We further design two object-centric regularization terms to refine object representations in the latent space, enabling our approach to scale to complex real-world images. Experimental results on simulated, real-world, complex texture and common object images demonstrate a substantial improvement in the quality of discovered objects compared to state-of-the-art methods, as well as the sample efficiency and generalizability of our approach. We also show that the discovered object-centric representations can accurately predict key object properties in downstream tasks, highlighting the potential of our method to advance the field of multi-object representation learning.

feature connectivity and object-centric regularization, multi-object representation learning, name change, (3 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multi-Object Representation Learning via Feature Connectivity and Object-Centric Regularization

Neural Information Processing SystemsJan-19-2025, 20:46:26 GMT

feature connectivity and object-centric regularization, multi-object representation learning, object-centric representation, (1 more...)

Neural Information Processing Systems

Genre: Research Report > Promising Solution (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning

Komatsu, Takayuki, Ohmura, Yoshiyuki, Kuniyoshi, Yasuo

arXiv.org Artificial IntelligenceOct-4-2023

Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects. Representation learning methods have often used unsupervised learning to segment an input image into individual objects and encode these objects into each latent vector. However, it is not clear how previous methods have achieved the appropriate segmentation of individual objects. Additionally, most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE). Therefore, it is not clear whether VAE regularization contributes to appropriate object segmentation. To elucidate the mechanism of object segmentation in multi-object representation learning, we conducted an ablation study on MONet, which is a typical method. MONet represents multiple objects using pairs that consist of an attention mask and the latent vector corresponding to the attention mask. Each latent vector is encoded from the input image and attention mask. Then, the component image and attention mask are decoded from each latent vector. The loss function of MONet consists of 1) the sum of reconstruction losses between the input image and decoded component image, 2) the VAE regularization loss of the latent vector, and 3) the reconstruction loss of the attention mask to explicitly encode shape information. We conducted an ablation study on these three loss functions to investigate the effect on segmentation performance. Our results showed that the VAE regularization loss did not affect segmentation performance and the others losses did affect it. Based on this result, we hypothesize that it is important to maximize the attention mask of the image region best represented by a single latent vector corresponding to the attention mask. We confirmed this hypothesis by evaluating a new loss function with the same mechanism as the hypothesis.

ablation study, multi-object representation learning, object segmentation, (2 more...)

arXiv.org Artificial Intelligence

2310.03273

Genre: Research Report > New Finding (0.73)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.87)

Add feedback

Multi-Object Representation Learning with Iterative Variational Inference

Greff, Klaus, Kaufmann, Raphaël Lopez, Kabra, Rishab, Watters, Nick, Burgess, Chris, Zoran, Daniel, Matthey, Loic, Botvinick, Matthew, Lerchner, Alexander

arXiv.org Machine LearningMar-1-2019

Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns -- without supervision -- to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Machine Learning

1903.0045

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Switzerland (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback